Adaptive Regret Minimization in Bounded-Memory Games
Authors
Abstract
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g., a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a two-player bounded memory-m game, both players simultaneously play an action, observe an outcome and receive a reward. The reward may depend on the last m outcomes as well as the actions of the players in the current round. The standard notion of regret for repeated games is no longer suitable because actions and rewards can depend on the history of play. To account for this generality, we introduce the notion of k-adaptive regret, which compares the reward obtained by playing actions prescribed by the algorithm against a hypothetical k-adaptive adversary with the reward obtained by the best expert in hindsight against the same adversary. Roughly, a hypothetical k-adaptive adversary adapts her strategy to the defender’s actions exactly as the real adversary would within each window of k rounds. Our definition is parametrized by a set of experts, which can include both fixed and adaptive defender strategies. We investigate the inherent complexity of, and design algorithms for, adaptive regret minimization in bounded memory games of perfect and imperfect information. We prove a hardness result showing that, with imperfect information, any k-adaptive regret minimizing algorithm (with fixed strategies as experts) must be inefficient unless NP = RP, even when playing against an oblivious adversary. In contrast, for bounded memory games of perfect and imperfect information we present approximate 0-adaptive regret minimization algorithms against an oblivious adversary running in time n^{O(1)}.
arXiv:1111.2888v1 [cs.GT] 11 Nov 2011
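The round structure described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its formal definition of k-adaptive regret. It only simulates a toy memory-m game and measures regret against fixed-action experts when the adversary is oblivious, meaning her action sequence is fixed in advance. All names (`play`, `regret_vs_fixed_actions`, `toy_outcome`, `toy_reward`) and the toy reward rule are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Sequence, Tuple

State = Tuple[int, ...]  # the last m outcomes, oldest first


def play(defender: Sequence[int], adversary: Sequence[int], m: int,
         outcome_fn: Callable[[int, int], int],
         reward_fn: Callable[[State, int, int], float]) -> float:
    """Total defender reward in a memory-m game against a fixed adversary sequence."""
    history = deque([0] * m, maxlen=m)  # assumption: initial outcomes are all 0
    total = 0.0
    for d, a in zip(defender, adversary):
        state = tuple(history)
        total += reward_fn(state, d, a)   # reward depends on last m outcomes and both actions
        history.append(outcome_fn(d, a))  # the new outcome enters the window
    return total


def regret_vs_fixed_actions(defender: Sequence[int], adversary: Sequence[int],
                            actions: Sequence[int], m: int,
                            outcome_fn: Callable[[int, int], int],
                            reward_fn: Callable[[State, int, int], float]) -> float:
    """Best fixed-action expert's reward (replayed against the same oblivious
    adversary sequence) minus the reward the defender actually obtained."""
    rounds = len(adversary)
    realized = play(defender, adversary, m, outcome_fn, reward_fn)
    best = max(play([x] * rounds, adversary, m, outcome_fn, reward_fn)
               for x in actions)
    return best - realized


# Hypothetical toy game with m = 1: the outcome is 1 (a "breach") when the
# defender fails to match the adversary, and a breach in the previous round
# costs the defender 0.5 in the current round.
def toy_outcome(d: int, a: int) -> int:
    return 1 if d != a else 0


def toy_reward(state: State, d: int, a: int) -> float:
    base = 1.0 if d == a else 0.0
    return base - 0.5 * state[-1]


ADVERSARY = [0, 1, 1, 0]  # oblivious: fixed before play begins
DEFENDER = [1, 0, 0, 1]

toy_regret = regret_vs_fixed_actions(DEFENDER, ADVERSARY, [0, 1], 1,
                                     toy_outcome, toy_reward)
```

Each expert is replayed from the start because the outcome window depends on the defender's own past actions. This is why the per-round comparison used for ordinary repeated games does not carry over directly.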
Similar resources
Monte Carlo Sampling for Regret Minimization in Extensive Games
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome samplin...
Adaptive Bandits: Towards the best history-dependent strategy
We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Θ of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model θ∗ ∈ Θ. The regr...
Regret Minimization in Non-Zero-Sum Games with Applications to Building Champion Multiplayer Computer Poker Agents
In two-player zero-sum games, if both players minimize their average external regret, then the average of the strategy profiles converges to a Nash equilibrium. For n-player general-sum games, however, theoretical guarantees for regret minimization are less understood. Nonetheless, Counterfactual Regret Minimization (CFR), a popular regret minimization algorithm for extensive-form games, has gen...
Regret minimization in repeated matrix games with variable stage duration
Regret minimization in repeated matrix games has been extensively studied ever since Hannan’s (1957) seminal paper. Several classes of no-regret strategies now exist; such strategies secure a long-term average payoff as high as could be obtained by the fixed action that is best, in hindsight, against the observed action sequence of the opponent. We consider an extension of this framework to repe...
Iterated Regret Minimization in Game Graphs
Iterated regret minimization was recently introduced by J.Y. Halpern and R. Pass for classical strategic games. For many games of interest, this new solution concept provides solutions that are judged more reasonable than those offered by traditional game concepts such as Nash equilibrium. In this paper, we investigate iterated regret minimization for infinite duration two-player qu...